Using ML to Classify the Unclassified
How can we use machine learning (ML) to address open-world classification of unknown (novel) objects?

Faulty Assumptions

A trained classifier is often built on the assumption that it has already seen, during training, every unique type of object it will later be asked to identify. For many applications this assumption is sufficient. For example, consider a classifier designed to perform optical character recognition (OCR) of license plate numbers. If the license plate characters always use the same font, then it is merely a matter of providing samples of each possible character (a finite quantity) within the training data set. This classifier, for its limited application, should work quite well.

The Challenges with Arbitrary Classification

But what if the classifier is now tasked with reading any arbitrary road sign? Variations in font would lead to misclassified characters, since the classifier was trained with only one font in mind. The brute-force solution is to add all possible fonts to the training set, but this would rapidly enlarge the training data and significantly increase the computational effort required to train the classifier. And every newly created font would require another full retraining run to update the classifier. Imagine the further increase in difficulty that would come from expanding to multiple languages and their associated alphabets.

Open-World Classification

This difficulty is where we start to venture into open-world classification, where a classifier is expected to manage both the accurate identification of objects present in its training set and the detection and treatment of unknown/novel objects. Complicating matters is the fact that traditional neural network classifiers are designed to produce an output drawn from a finite set of N possible classes, and novel objects are often relegated to one of these known output classes instead of being identified as a new (N+1)th class.

The growing usage of neural network classifiers in open, real-world environments requires the critical capability to identify and incorporate novel inputs for which the system has not been trained. Mislabeling novel stimuli as known stimuli hinders both the accurate performance of the system and its ability to gather new information.

If our goal is to design an open-world classifier that can identify novel inputs, how should we proceed? The potential solutions can be summarized into three general approaches: training set design, output layer design, and alternative classification systems.

Training Set Design

If you have N known target classes that you are training for, and you want to add an (N+1)th class that can represent “anything else”, you can attempt to create a new class within your training data set that novel objects are more likely to be classified as. This “catch-all” class should be designed so that it contains combinations of features not present in the target classes.
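
As a minimal sketch of this idea, using entirely hypothetical 2-D data: the known classes are labeled 0 through N-1, and a catch-all class labeled N is built from samples deliberately kept away from the known clusters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N = 3  # number of known target classes

# Known classes: three well-separated 2-D clusters, labels 0..N-1.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X_known = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in centers])
y_known = np.repeat(np.arange(N), 100)

# Catch-all class, label N: broad samples kept only if they sit away from
# every known cluster, i.e. feature combinations not present in the targets.
candidates = rng.uniform(-8.0, 8.0, size=(2000, 2))
dists = np.linalg.norm(candidates[:, None, :] - centers[None, :, :], axis=2)
X_other = candidates[dists.min(axis=1) > 2.0][:300]
y_other = np.full(len(X_other), N)

X_train = np.vstack([X_known, X_other])
y_train = np.concatenate([y_known, y_other])

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
# Inputs far from the known clusters should land in the catch-all class N.
print(clf.predict([[6.0, -6.0]]))  # likely [3]
```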

Defining “Anything Else”

Unfortunately, this approach may only be feasible for classification problems involving relatively small feature spaces (for example, Fisher’s Iris data set[1]). Consider an image classification task for identifying pictures of different types of cats. If we want this classifier to perform well with any picture it is given, what sort of catch-all training set class could be designed to best represent and capture everything that is not a cat? Pictures of other animals? Vehicles? Random noise?

Balancing Inputs

This catch-all class could grow rather large, which introduces another problem: balancing the influence of each class on the outcome. Including more training samples in one class can skew classification results in favor of that class. If we desire a classifier that performs equally well for all training classes, weighting or under/oversampling of classes may be necessary.
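
A minimal sketch of the weighting option, with hypothetical label counts: scikit-learn's "balanced" mode weights each class inversely to its frequency, so an oversized catch-all class does not dominate training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Three target classes with 100 samples each, plus a 1000-sample catch-all.
y_train = np.array([0] * 100 + [1] * 100 + [2] * 100 + [3] * 1000)

weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
print({int(c): round(float(w), 3) for c, w in zip(np.unique(y_train), weights)})
# {0: 3.25, 1: 3.25, 2: 3.25, 3: 0.325}

# Equivalently, many classifiers accept the weighting directly at fit time:
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
```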

Output Layer Design

Perhaps, instead of focusing on the design of inputs to the classifier, we should focus on how its output is generated. A typical neural network is built from several layers: an input layer, multiple hidden layers (including convolution, pooling, and fully connected layers), and an output layer.[2] The activation function used in the output layer, which maps the outputs of the last hidden layer to a given set of output values, commonly takes the form of a sigmoid or softmax function. The softmax function[3] is designed to normalize output values to a series of class probabilities that sum to 1. A test input to the classifier is therefore categorized as the class associated with the highest output probability.
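
The softmax mapping itself is simple; a short sketch with made-up logits for three classes:

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)        # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])     # hypothetical raw outputs for 3 classes
probs = softmax(logits)
print(probs, probs.sum())              # approx. [0.659 0.242 0.099], sums to 1.0
predicted_class = int(np.argmax(probs))  # the class with the highest probability
```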

Thresholding

Setting an appropriate threshold on this output probability could be one way to create an “unclassified” result: instead of picking the highest probability class, if no class reaches the specified threshold then the input is not assigned a class (i.e., it is novel). While relatively simple, this approach is not without complications. Picking the right threshold would likely be empirically driven by the training data sets used, and the result could be highly sensitive to its value. For example, it has been shown that Deep Neural Networks (DNNs) can be fooled into misclassifying images with extremely high confidence by changing a single pixel in the input.[4] Even with a high threshold on the output, such high-confidence misclassifications would escape detection as “novel” inputs.
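
The rejection rule itself is a one-liner; the threshold value below (0.9) is an arbitrary placeholder that would have to be tuned empirically, as noted above.

```python
import numpy as np

def classify_with_rejection(probs, threshold=0.9):
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None                     # no class is confident enough: treat as novel
    return best

print(classify_with_rejection(np.array([0.97, 0.02, 0.01])))  # 0 (confident)
print(classify_with_rejection(np.array([0.40, 0.35, 0.25])))  # None (novel)
```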

OpenMax

An alternative to a softmax output layer is OpenMax.[5] In this design the probability of an input being from an unknown class is estimated using meta-recognition concepts, dropping the softmax requirement that output probabilities for known classes sum to 1 and rejecting inputs that are “far” (in feature-space terms) from known inputs. This has shown better performance than softmax at rejecting unknown inputs with minimal reduction in the correct classification of known inputs.
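
This is not the full OpenMax algorithm (which fits per-class Weibull models to the tails of the distance distribution); the sketch below only illustrates its core idea: compare an input's penultimate-layer activations with each class's mean activation vector (MAV) and reject inputs that are far from all of them. The distance threshold is a placeholder assumption.

```python
import numpy as np

def reject_if_far(activation, class_mavs, max_distance=5.0):
    # Distance from the input's activation vector to each class MAV.
    dists = np.linalg.norm(class_mavs - activation, axis=1)
    nearest = int(np.argmin(dists))
    if dists[nearest] > max_distance:
        return None                      # far from every known class: unknown
    return nearest

mavs = np.array([[1.0, 0.0], [0.0, 1.0]])           # hypothetical 2-class MAVs
print(reject_if_far(np.array([0.9, 0.1]), mavs))    # 0
print(reject_if_far(np.array([10.0, 10.0]), mavs))  # None (far from both)
```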

Deep Open Classification

Further improvements in the identification of novel inputs were seen for Deep Open Classification (DOC) networks employing a 1-vs-rest output layer.[6] This output layer uses sigmoidal activation functions and tightens their decision boundaries with Gaussian fitting. This approach was better able to identify unknown inputs in text and image classification tasks than OpenMax.
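
A sketch of that threshold tightening, assuming we already have each class's 1-vs-rest sigmoid scores for its own training examples: following the DOC recipe, the scores are mirrored around 1 to estimate a Gaussian standard deviation, and the rejection threshold is raised above the default 0.5.

```python
import numpy as np

def doc_threshold(positive_scores, alpha=3.0):
    # Mirror each score p around 1 (i.e. add 2 - p) so the set is symmetric
    # about 1, then estimate the standard deviation of that Gaussian.
    mirrored = np.concatenate([positive_scores, 2.0 - positive_scores])
    sd = mirrored.std()
    return max(0.5, 1.0 - alpha * sd)

scores = np.array([0.99, 0.97, 0.95, 0.98, 0.90])  # hypothetical training scores
print(doc_threshold(scores))                        # a threshold well above 0.5
```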

Alternative Classification Systems

As an alternative to, or in addition to, modifying the neural network classifier layers as illustrated above, additional components can be added to the system before, after, or in parallel with the neural network. One approach to dealing with novel inputs is to first perform an outlier analysis on any new inputs to the classifier, using the training data set for comparison. There are many techniques for performing this anomaly detection[7], such as cluster analysis, local outlier factor (LOF), and isolation forests, to name just a few. These would detect and remove outliers (i.e., novel inputs) before they ever reach the classifier. It is then just a matter of how to handle them: either discard them, or incorporate them into a new class and retrain the classifier.
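
A minimal sketch of this screening step with hypothetical data, using scikit-learn's IsolationForest fitted on the training features:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 4))          # known-class training features

detector = IsolationForest(random_state=0).fit(X_train)

X_new = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),   # in-distribution inputs
                   rng.normal(8.0, 1.0, size=(5, 4))])  # far-away, likely novel

flags = detector.predict(X_new)   # +1 = inlier, -1 = outlier
inliers = X_new[flags == 1]       # only these are passed on to the classifier
novel = X_new[flags == -1]        # held back for review or retraining
```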

Multiple Networks

An open-world classification architecture by Shu et al.[8] builds on the previous DOC network design by jointly training a set of three networks to identify, classify, and cluster novel inputs. It employs an Open Classification Network (OCN) to classify both known and unknown inputs, a Pairwise Classification Network (PCN) to determine whether pairs of inputs are from the same or different classes, an auto-encoder to learn representations from unlabeled examples, and a hierarchical clustering algorithm that groups unknown inputs into “hidden” classes using the PCN output as a distance metric. This architecture provides both a means to identify unknown inputs and a way to cluster them into usable classes that can then be added to the training set.
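
A sketch of just the final clustering step, assuming we already have, for each pair of rejected inputs, a PCN-style score giving the probability that the two inputs belong to different classes; that score is used directly as a pairwise distance for hierarchical clustering. The matrix below is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical pairwise "different-class" probabilities for 4 unknown inputs.
pcn_dist = np.array([[0.0, 0.1, 0.9, 0.8],
                     [0.1, 0.0, 0.9, 0.9],
                     [0.9, 0.9, 0.0, 0.2],
                     [0.8, 0.9, 0.2, 0.0]])

Z = linkage(squareform(pcn_dist), method="average")   # condensed distances
labels = fcluster(Z, t=0.5, criterion="distance")     # cut into hidden classes
print(labels)   # e.g. two hidden classes: [1 1 2 2]
```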

Generative Adversarial Networks

A recent paper from Narayan et al.[9] attempts to classify unseen categories (which they refer to as zero-shot classification) by making use of Generative Adversarial Networks (GANs). GANs were primarily designed for unsupervised learning and consist of two networks: a generative network that generates new input data based on a known initial training set, and a discriminative network that attempts to correctly identify the input data (both true data and data created by the generator). The two networks thus compete with one another, with the generator learning how to “fool” the discriminator with synthesized data, and the discriminator learning how to better separate true data from synthesized data. The zero-shot classifier uses a GAN to synthesize unseen class features and improve the discrimination of known from unknown inputs.
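
To make the two-network contest concrete, here is a toy GAN sketch on 1-D data (PyTorch). It illustrates only the general generator-versus-discriminator training loop described above, not the zero-shot architecture of Narayan et al.; all sizes and hyperparameters are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_samples(n):
    # "True" data: a simple 1-D Gaussian centered at 3.0.
    return torch.randn(n, 1) * 0.5 + 3.0

# Generator maps random noise to synthetic 1-D samples.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator outputs the probability that a sample is real.
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: label real samples 1 and generated samples 0.
    real = real_samples(64)
    fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# After training, generated samples should drift toward the real mean (~3.0).
print(G(torch.randn(5, 8)).detach().squeeze())
```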

Summary

As we come to depend more on AI/ML systems in our everyday lives, the need for them to perform robustly and reliably becomes all the more critical. One facet of this is the ability to manage the novel stimuli they will face as they move from controlled laboratory environments to the real world. The techniques above are just a few of the more recent attempts in the literature to design deep learning systems that can classify the unclassified and expand their capacity for interacting with, and thriving within, open-world environments.

References

[1]: https://en.wikipedia.org/wiki/Iris_flower_data_set.

[2]: Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, Kaori Togashi, “Convolutional Neural Networks: An Overview and Application in Radiology”, Insights Imaging 9, 2018, pp. 611–629, https://doi.org/10.1007/s13244-018-0639-9.

[3]: https://en.wikipedia.org/wiki/Softmax_function.

[4]: Jiawei Su, Danilo Vasconcellos Vargas, Kouichi Sakurai, “One Pixel Attack for Fooling Deep Neural Networks”, IEEE Transactions on Evolutionary Computation 23(5), 2019.

[5]: Abhijit Bendale, Terrance E. Boult, “Towards Open Set Deep Networks”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1563-1572.

[6]: Lei Shu, Hu Xu, Bing Liu, “DOC: Deep Open Classification of Text Documents”, 2017, https://arxiv.org/abs/1709.08716.

[7]: https://en.wikipedia.org/wiki/Anomaly_detection.

[8]: Lei Shu, Hu Xu, Bing Liu, “Unseen Class Discovery in Open-world Classification”, 2018, https://arxiv.org/abs/1801.05609.

[9]: Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao, “Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification”, 2020, https://arxiv.org/abs/2003.07833.
